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CLOSED-FORM LIKELIHOOD EXPANSIONS 
FOR MULTIVARIATE DIFFUSIONS 

oo 

o 
o 

C*H, This paper provides closed-form expansions for the log-likehhood 

■^C ' function of multivariate diffusions sampled at discrete time intervals. 

The coefficients of the expansion are calculated explicitly by exploit- 
ing the special structure afforded by the diffusion model. Examples of 
interest in financial statistics and Monte Carlo evidence are included, 
t-H I along with the convergence of the expansion to the true likelihood 

^Q . function. 

^' 

"t^ ' 1. Introduction. Diffusions and, more generally, continuous-time Markov 

processes are generally specified in economics and finance by their evolution 
over infinitesimal instants, that is, by writing down the stochastic differ- 
ential equation followed by the state vector. However, for most estimation 
^ . strategies relying on discretely sampled data, we need to be able to infer the 

OO I implications of the infinitesimal time evolution of the process for the finite 

J;£^ ' time intervals at which the process is actually sampled, say daily or weekly. 

The transition function plays a key role in that context. Unfortunately, the 
^^ transition function is, in most cases, unknown, 

f— N ■ At the same time, continuous-time models in finance, which until re- 

00 ! cently have been largely univariate, now predominantly include multiple 

^^ ' state variables. Typical examples include asset pricing models with multiple 

^ . explanatory factors, term structure models with multiple yields or factors 

'k>( I and stochastic volatility or stochastic mean reversion models (see Sundare- 

san [28] for a recent survey). Motivated by this trend and the need for 



effective representation methods, I construct closed- form expansions for the 
log-transition function of a large class of multivariate diffusions. Because 
diffusions are Markov processes, the log-likelihood function of observations 
from such a process sampled at finite time intervals reduces to the sum of 
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2 Y. AIT-SAHALIA 

the log-transition function of successive pairs of observations. A closed form 
expansion for the latter therefore makes quasi-likelihood inference feasible 
for these models. 

The paper is organized as follows. Section 2 sets out the model and as- 
sumptions. In Section 3, I introduce the concept of reducibility of a diffusion 
and provide a necessary and sufficient condition for the reducibility of a mul- 
tivariate diffusion. In an earlier work (Ai't-Sahalia [2]), I constructed explicit 
expansions for the transition function of univariate diffusions based on Her- 
mite series. The natural extension of the Hermite method to the multivari- 
ate case is applicable only if the diffusion is reducible, which all univariate, 
but few multivariate, diffusions are. So, this paper proposes an alternative 
method, which determines the coefficients in closed form by requiring that 
the expansion satisfies the Kolmogorov equations describing the evolution 
of the process up to the order of the expansion itself. When a diffusion is re- 
ducible, the coefficients of the expansion are obtained as a series in the time 
variable, which I show in Section 4. When the diffusion is not reducible, the 
expansion involves a double series in the time and state variables, described 
in Section 5. Section 6 then studies the convergence of the likelihood ex- 
pansion and the resulting maximizer to the theoretical (but incomputable) 
maximum likelihood estimator. Section 7 contains examples of multivariate 
diffusions and Monte Carlo simulation results. Proofs are in Section 8 and 
Section 9 concludes the paper. 

2. Setup and assumptions. Consider the multivariate time-homogenous 
diffusion 

(1) dXt = fiiXt)dt + aiXt)dWt, 

where Xt and fJ-iXt) are mxl vectors, cr{Xt) is an mx m matrix and Wt is 
an m X 1 vector of independent Brownian motions. Independence is without 
loss of generality since arbitrary correlation structures between the shocks to 
the different equations can be modeled through the inclusion of off-diagonal 
terms in the a matrix, which, furthermore, need not be symmetric. In time- 
inhomogeneous diffusions, the coefficients are allowed to depend on time 
directly, as in fi{Xt,t) and a{Xt,t), beyond their dependence on time via 
the state vector. The time-inhomogeneous case can be reduced to the time- 
homogenous case by treating time as an additional state variable and so it 
suffices to return to the model specified in (1). 

The objective of this paper is to derive closed-form approximations to the 
log of the transition function px{x\xo,A), that is, the conditional density 
of Xt-|_A = X given Xt = xq induced by the model (1). From an inference 
perspective, the primary use of this construction is to make feasible the 
computation of the MLE. Assume that we parametrize (/x, a) as functions 
of a parameter vector and observe X at dates {t = iA\i = 0,. . . ,n}, where 
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A > is fixed. The Markovian nature of (1), wliich tlie discrete data inherit, 
implies that the log-hkelihood function has the simple form 

n 

(2) £„(^,A)^^/x(X,a|X(,_i)a,A), 

where Ix = Inpx and where the asymptotically irrelevant density of the 
initial observation, Xq, has been left out. In practice, the issue is that for 
most models of interest, the function px-, hence Ix-, is not available in closed 
form. 

I will use the following notation. Let Sx-, a subset of M"*, denote the 
domain of the diffusion X, assumed, for simplicity, to be of the following 
form. 

Assumption 1. Sx is a product of m intervals with limits Xj and Xj, 
where possibly Xj = — oo and/or Xi = +cx), in which case, the intervals are 
open at infinite limits. 

I will use to denote transposition and, for a function r]{x) = (771(2;), . . . , 
Vdix)) , differentiable in x, I will write Vr/(x) for the Jacobian matrix of 
rj, that is, the matrix Vr/(x) = [dr]i{x)/dxj]i=i^^^^^d;j=i,...,m- For x gM'", |Ix|| 
denotes the usual Euclidean norm. If a = [aij]ij=i^,„^rn is an m x m invert- 
ible matrix, then I write a~^ for the matrix inverse, with elements [a~^]ij. 
Det[a] and tr[a] denote the determinant of a and its trace, respectively. 
If a = [aj]j=i,...,m is a vector, tr[a] denotes the sum of the elements of a. 
a = diag[aj]j=i^...^m denotes the m x m diagonal matrix with diagonal ele- 
ments Oj. When a function r]{x) is invertible in x, I write ??"^^(y) for its 
inverse. 

In some instances, it may be more natural to directly parametrize the 
infinitesimal variance-covariance matrix of the process 

(3) v{x) = (t(x)(t (x) 

than o"(x) itself. Every characterization of the process, such as its transition 
probability, depends, in fact, on {n,v). In particular, it can be shown that, 
should there exist a continuum of solutions in a to equation (3), then the 
transition probability of the process is identical for each of these a (see 
Remark 5.1.7 and Section 5.3 in Stroock and Varadhan [27]). This is also 
quite clear from inspection of the infinitesimal generator Ax of the process, 
which depends on v rather than a. For functions /(A,x) that are suitably 
differentiable on its domain. Ax has the action 

dfix,A) , ^ ^ , 9/(x,A) , 1 ^ , , d^f{x,A) 

4 = 1 * «J = 1 ■' 
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The domain of Ax includes at least those functions that, for each xq € 
Sx, are once continuously differentiable in A in R+, twice continuously 
differ entiable in x G Sx and have compact support. 

As this will play a role in the likelihood expansions, define 

(5) D^{x) = ^ln(Det[v{x)]). 

I will assume that this matrix v satisfies the following regularity condition: 

Assumption 2. The matrix v{x) is positive definite for all x in the 
interior of Sx ■ 

Further assumptions are required to ensure the existence and uniqueness 
of a solution to (1) and to make the computation of likelihood expansions 
possible. I will assume the following. 

Assumption 3. /^(x) and (t{x) are infinitely differentiable in x on Sx- 

Assumption 3 ensures the uniqueness of solutions to (1). Indeed, Assump- 
tion 3 implies in particular, that the coefficients of the stochastic differential 
equation are locally Lipschitz under their assumed (once) differentiability, 
which can be seen by applying the mean value theorem. This ensures that a 
solution, if it exists, will be unique (see, e.g.. Theorem 5.2.5 in Karatzas and 
Shreve [21]). The infinite differentiability assumption in x is unnecessary for 
that purpose, but it allows the computation of expansions of the transition 
density, which, as we will see, involves repeated differentiation of the co- 
efficient functions fx and a. There exist models of interest in finance, such 
as Feller's square root diffusion used in the Cox, Ingersoll and Ross model 
of the term structure, that fail to satisfy the Lipschitz condition since they 
violate the differentiability requirement of Assumption 3 at a boundary of 
Sx'- for instance, <t{x) = a^x^'"^ is not differentiable at the left boundary 
of Sx- It is then possible to weaken Assumption 3 to cover such cases (see 
Watanabe and Yamada [30] and Yamada and Watanabe [32]). 

The next assumption restricts the growth behavior of the coefficients near 
the boundaries of the domain. 

Assumption 4. The drift and diffusion functions satisfy linear growth 
conditions, that is, there exists a constant K such that for all x G Sx and 

(6) \ni{x)\<K{l + \\x\\) and \a^j{x)\ < K{1 + \\x\\). 
Their derivatives exhibit at most polynomial growth. 
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The role of Assumption 4 is to ensure the existence of a solution to the 
stochastic differential equation (1) by preventing explosions of the process 
in finite expected time. While it can be relaxed in specific examples, it is 
not possible to do so in full generality. In dimension one, however, finer re- 
sults are available (see the Engelbert-Schmidt criterion in Theorem 5.5.15 
of Karatzas and Shreve [21]), allowing linear growth to be imposed only 
when the drift coefficient pulls the process toward an infinity boundary (see 
Proposition 1 of Ai't-Sahalia [2]). In all dimensions, the linear growth con- 
dition in Assumption 4 is only an issue near the boundaries of Sx- In the 
special case where Sx is compact, the growth condition (boundedness, in 
fact) follows from the continuity of the functions. The additional assumption 
that the derivatives of the drift and diffusion functions grow at most poly- 
nomially simplifies matters in light of the exponential tails of the transition 
density px ■ 

Finally, the diffusion process X is fully defined by the specification of 
the functions /u and a and its behavior at the boundaries of Sx- In many 
examples, the specification of fi and a predetermines the boundary behav- 
ior of the process, but this will not be the case for models that represent 
limiting situations. For instance, in Cox, Ingersoll and Ross processes with 
affine /i and v, the behavior at the boundary depends upon the values 
of the parameters. When this situation occurs for a particular model, the 
behavior of the likelihood expansion near such a boundary will be specified 
exogenously to match that of the assumed model. 

3. Reducible difTusions. Whenever possible, I will first transform the dif- 
fusion X into one that is more amenable to the derivation of an expansion for 
its transition density. For that purpose, I introduce the following definition. 



Definition 1 (Reducibility) . The diffusion X is said to be reducible to 
unit diffusion (or reducible, in short) if and if only if there exists a one- 
to-one transformation of the diffusion X into a diffusion Y whose diffusion 
matrix ay is the identity matrix. That is, there exists an invertible function 
7(x), infinitely differentiable in X on Sx, such that Yf = "y{Xt) satisfies the 
stochastic differential equation 

(7) dYt = fiYiYt)dt + dWt 
on the domain Sy- 

By Ito's lemma, when the diffusion is reducible, the change of variable 
7 satisfies S/'y{x) = a~^{x). Every scalar (i.e., one-dimensional) diffusion is 
reducible, by means of the simple transformation 

(8) y* = 7(X,) ' 



cj(u)' 
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where the lower bound of integration is an arbitrary point in the interior of 
Sx- The differentiabihty of 7 ensures that ^y satisfies Assumption 3. This 
change of variable, known as the Lamperti transform, played a critical role 
in the derivation of closed-form Hermite approximations to the transition 
density of univariate diffusions in Ai't-Sahalia [2]. How to deal with the 
case where l/a{u) cannot be integrated in closed form is discussed after 
Proposition 2 below. Whenever a diffusion is reducible, an expansion can be 
computed for the transition density px of X by first computing it for the 
density py of the reduced process Y and then transforming Y back into X, 
essentially proceeding by extending the univariate method. 

However, not every multivariate diffusion is reducible. Whether or not a 
given multivariate diffusion is reducible depends on the specification of its 
o" matrix, in the following way. 

Proposition 1 (Necessary and sufficient condition for reducibility) . The 
diffusion X is reducible if and only if 

(9) E^;^-o(-)=E^^-/^(-) 

for each x in Sx end triplet {i,j, k) = 1, . . . ,m such that k > j. If a is non- 
singular, then the condition can he expressed as 



(10) 



d\a \i{x) _ d{a \k{x) 
dxk dxj 



Similar restrictions on the a matrix arise in different contexts; see Doss 
[11] who studied the question of when the solution X of the SDE can be 
expressed as a function of the Brownian motion W and the solution of an 
ODE and the concept of "commutative noise" in Section 10.6 of Cyganowski, 
Kloeden and Ombach [8] where they show that restricting the a matrix leads 
to a simplification of the Milshtein scheme for X . 

In the bivariate case m = 2, condition (10) reduces to 

d[a-^]u{x) d[a-']i2{x) ^ d[a-']2iix) d[a-%2{x) ^^ 
dx2 dxi dx2 dxi 

For instance, consider diagonal systems: if 0"i2 = cr2i = 0, then the reducibil- 
ity condition becomes d[a~^]ii/dx2 = d[a~^]22/dxi = 0. Since [cr""'^]^ = l/au 
in the diagonal case, reducibility is equivalent to the fact that an depends 
only on Xj for each i = 1,2. This is true more generally in dimension m. Note 
that this is not the case if off-diagonal elements are present. Another set of 
examples is provided by the class of stochastic volatility models. Consider 
the two models where either 

^(^)-f ^11(^2) ^ nr rrrr^-f^^^^l) a{xi)b{x2)\ 

^ ^"l a22{x2)J ^W-(^ c{x2) )■ 
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In the first model, the process is not reducible in light of the previous diago- 
nal example, as this is a diagonal system where an depends on X2- However, 
in the second, the process is reducible. 

4. Closed-form expansion for the likelihood function of a reducible diffu- 
sion. When the diffusion is reducible, I propose two approaches to construct 
a sequence of explicit expansions for the log-likelihood function. The first is 
based on computing the coefficients of a Hermite expansion for the density 
of the transformed process, py- The coefficients are found in the form of a 
series expansion in A, the time separating successive observations. 

The second approach takes the form of the Hermite series and determines 
its coefficients by solving the Kolmogorov partial differential equations which 
characterize the transition function py- In both cases, given a series for py, I 
obtain a series for the original object of interest, px, by reversing the change 
of variable and the Jacobian formula. The two approaches yield the same 
final series. 

4.1. Multivariate Hermite expansions. To motivate the form of the ex- 
pansion that I will propose in the multivariate case, in both the reducible and 
irreducible cases, consider the following natural multivariate counterpart to 
the univariate Hermite expansion of Ait-Sahalia [2]. Let (j){x) denote the 
density of the TTi-dimensional multivariate Normal distribution with mean 
zero and identity covariance matrix. For each vector h= [hi, . . . ,hm) G 
N"^, recall that ti[[h] = hi + ■ ■ ■ + hm and let Hh{x) denote the associ- 
ated Hermite polynomials, which are defined by Hfi{x) = {{—lY^^^'/(j){x)) 
d^^^ '(l){x)/dxi^ ■ ■ ■ dx^ and can be computed explicitly to an arbitrary order 
ti[h] (see, e.g.. Chapter 5 of McCullagh [23] or Withers [31]). The polynomi- 
als are orthonormal in the sense that J^rn Hh{x)Hk{x)(f){x) dx = h\\ ■ ■ ■ /im,! if 
h = k and otherwise. 

The Hermite series approximation of py is in the form 

(11) p^^\y\yo,A) = A-/^Jy—^) ^ ,,(A,yo)i^.(^) 

and the Hermite coefficients rjh{A,yQ) can be computed as in the univariate 
case: by orthonormality of the Hermite polynomials, the coefficients rjh are 
given by the conditional expectation 

(12) Vh{A,yo) = , , \ E[Hh{A-'/\Yt+A - yo))\Yt = yo]. 

This expression is then amenable to computing an expansion in A using 
the generator (4). To evaluate that conditional expectation, the deterministic 
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Taylor expansion 

EYAf{yA,Yo,A)\Yo = yo] 
(13) 

k=0 

can be used, where Ay is the infinitesimal generator of the process Y, the 
function / is sufficiently differentiable in (y, 5) and its iterates by application 
of Ay up to K times remain in the domain of Ay, as in Ait-Sahalia [2]. The 
result will be a "small-time" expansion, in the same spirit as in Azencott [4] 
and Dacunha-Castelle and Florens-Zmirou [9], except that the expansions 
given here are in closed form instead of relying on moments of functionals 
of Brownian bridges (which are to be computed by simulation). Replacing 
the unknown rjh in (11) by their expansions in A to order K gives rises to 

an expansion of py where the coefficients are gathered in increasing powers 

of A, which I denote py' . If we gather the terms in the right-hand side 

of (11) according to powers of A, we can rewrite py' in the form of a 
truncated series in A, 



(14) #^)(y|yO, A) = A-'»/2</,(A-V2(y _ y^)) ^ c'^^'^) ^y\y^) 

k=o '^■ 

For the log-transition density and for any given J, or in the univariate 
case where the convergence of the Hermite series is established as J — *• oo, 
the resulting expansion has the form 

^lt;\ ;W/ I A\ "^1 ^o A^ , ^Y \y\yo) , V^^(fc)/ I ^A'' 

(15) I'y '{y\yo,A) = -—ln{27rA)+ ^ + 2^Cl.\y\yo) — , 

whose coefficients Cy , k = —1,0,1,2, ... ,K, are combinations of the coef- 
ficients of (11) obtained by identifying the terms in the expansion in A of 
the log of (14). 

The method just described is the natural extension to the multivariate set- 
ting of the Hermite approach employed in the univariate case in Ait-Sahalia 
[2] . Extensions of the univariate Hermite expansion results in two different 
univariate directions have been recently developed for time-inhomogeneous 
diffusions (Egorov, Li and Hu [13]) and for models driven by Levy processes 
other than Brownian motion (Schaumburg [25] and Yu [33]). DiPietro [10] 
has extended the methodology to make it applicable in a Bayesian setting. 
The Hermite method requires, however, that the diffusion be reducible since 
the straight Hermite expansion will not in general converge if applied to px 
directly instead of py. And as discussed above, while all univariate diffu- 
sions are reducible, so that such a Y exists, not all multivariate diffusions 
are. This necessitates an alternative method, which I now develop. 
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4.2. Connection to the Kolmogorov equations. An alternative method to 
obtain an explicit expansion for ly is to take inspiration from the form of the 
solution given by the expansion (15) and to use the Kolmogorov equations 
to determine its coefficients, without any further reference to the Hermite 
expansion. As is often the case when a differential operator is involved, it is 
easier to verify that a given functional form, in this case the expansion in 
the form (15), is the right solution. 

Consider the forward and backward Kolmogorov equations (see, e.g.. Sec- 
tion 5.1 of Karatzas and Shreve [21]) 

gpy(j/||/o,A) _ ^ d{^iYi{y)PY{y\y(i,^)} , l^ 9^py(j/|yo,A) 
^ ^ 5A - ^ % +2^ dy^ ' 

.,„. 5py(y|yo,A) ^ apY(y|yo,A) I ^ d'^pY{y\vQ,l^) 

^''^ dA =g^'-(^o) 9y,^ +2g dyl ■ 

The solution py inherits the smoothness in (A,y,yo) of the coefficients fiy 
(see, e.g.. Section 9.6 in Friedman [14]), so we are entitled to look for an 
approximate solution in the form of a smooth expansion. The fact that the 
Hermite expansion turns out to have exactly the right form for solving the 
forward and backward equations term by term is an interesting feature of 
these expansions. 

Focusing for now on the forward equation (16), the equivalent form for 
the log-likelihood ly (which is the object of interest for MLE and which will 
turn out to lead to a simple linear system) is 

dly{y\yo,A) ^dfiyi{y) ^ dly{y\yo,A) 

8A =-g^^-|:/^-(^) 9y^ 

(18) 

ly. d^ly{y\yo,A) l^f dly{y\yo,A) 

+ 2^ dyf 2,^1 % 

Suppose that we substitute the postulated form of the solution (15) into 
(18). Since 

a/f)(y|yo,A)_ C^y-'\y\yo) m /^^ .fc),..,.. ^ A^'"' 



^ + Y.c'y'iy\yo) 



dA A2 2A ^^ ^ '^'^'''(A;-l)! 

dl'y^\y\yo,A) _ l dC^~'\y\yo) ^ ^ dCp\y\yo) A>^ 
dyi A dyi ^ dyi k\ 

dn^^\y\y,,A) ^ I d^c'f'\y\yo) ^ d^c'y'\y\yo) A^ 
dyi A dyi ^l^^ dyi fc! ' 
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equating the coefficients of A ^ on both sides of (18) iniphes that the leading 

, must solve the nonli: 



coefficient in the expansion, Cy , must solve the nonlinear equation 



Because the density must approximate a Gaussian density as A — >■ 0, the 
appropriate solution is the one with a strict maximum at y = yo, namely 



(20) 4 '\y\yo) = -^V " Vof = -^T^iVi - yo 



m 

2 



i=l 



Considering now the coefficients of A ^ on both sides of (18), we see that 
^ dC^\y\yo) ™ 

2^ ^ [yi - yoi) = 2^f^Yi{y){yi - yoi)- 

Integrating along a line segment between yo and y, we obtain 



(21) 



ra „\ 

Cy (ybo) = y](yi -yoi) / fJ^niyo + uiv - yo)) du, 



with integration constants determined in the proof of the theorem below 
using boundary conditions and the backward equation. The higher-order 
coefficients are obtained using the same principle, and we have the following 
result. 

Theorem 1. The coefficients of the log-density expansion ly {y\yQ,/S.) 
are given explicitly by (20), (21) and, for k>l, 

(22) ciy'\y\yo) = k f' G^^ {yo + u{y - yo)\yo)u''-^ du. 

Jo 

The functions Gy are given by 

r^(i)^ I ^ x^dfiYiiy) ^ , dGP{y\yQ) 



i=i ^yi 7^1 ^yi 

23 



1 - ra^cHyb)^ 

2 it I dy^ 



dC'^>{y\yo) 



dyi 



and, for k>2, 



^w, I , ^ , , dGt'\y\yo) , ^^d'ct'\y\yo) 
G'y'{y\yo) = - l^i^Yiiy) tt- + o 2^ ■ 



,=1 ^y^ ' 2fr[ dyf 

(24) 

ly^gr^-n dc!f\y\yo) dGt'-'\y\yo) 



'^tlto^ ^ ^ ^y^ ^y^ 
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Theorem 1 provides the exphcit form of ly that solves the Kolmogorov 

equations to the desired order A . This does not necessarily imply that ly 
is a proper Taylor expansion of ly at the desired order A ; this will be 
established as part of Theorem 3 below. 

4.3. Change of variable. Given ly, the expression for Ix is given by the 
Jacobian formula 

lxix\xo,A) = -^ln(Bet[v{x)])+ly{A,-f{x)\j{xo)) 
(25) 

= -D^{x) + ly{A,j{x)\j{xo)), 

which I mimic at the level of the approximations of order K in A, thereby 
defining l)^ as 

l^Pix\xo,A) = -D,ix)+l^^\A,^ixMxo)) 

Tfl 

(26) =--ln(27rA)-L)^(x) 

fc=0 

from ly given in (15), using the coefficients Cy^k = — 1, 0, . . . , -fC, given in 

Theorem 1. By construction, l^ solves the Kolmogorov equations for X at 
the same order. 

5. Closed-form expansion for the log-likelihood function of an irreducible 
diffusion. In the reducible case, the two approaches (Hermite and solution 
of the Kolmogorov equations) coincide. When the diffusion is irreducible, 
however, one no longer has the option of first transforming X to Y, com- 
puting the Hermite expansion for Y and then, via the Jacobian formula, 
transforming it into an expansion for X. But, it remains possible to postu- 
late an appropriate form of an expansion for Ix and then to determine that 
its coefficients satisfy the Kolmogorov equations to the relevant order, as 
follows. 

Mimicking the form of the expansion in A obtained in the reducible case, 
namely (26), leads to the postulation of the following form for an expansion 
of the log-likelihood 

l^P {x\xq, A) = -— ln(2^A) - D^{x) 

(27) ^ 

C^^\x\x^ ^ ,^) A^ 
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and solving for the coefficients using the Kolmogorov equations. The expan- 
sion exists because the log-transition function inherits the smoothness of the 
coefficients {fJ,,v) (see, e.g., Section 9.6 of Friedman [14]). 

When written directly for the X process, as required in the irreducible 
case, the equations take the form 

dlxix\xo,A) _ y> dfii{x) 1 y> d'^Vijjx) 

dA ~ ^^ dxi 2 .^^ dxidxi 
1=1 * «j=i -' 



H/^*( 



X 



dlx{x\xo,A) 



^ dvij{x)dlx{x\xo,A) 

ij=i * J 

^1 ^ , ^ d^lx{x\xo,A) 
H — > ViAx) — 

jj=l * 3 

1 -A dlxix\xo,A) dlx{x\xo,A) 

+ o Z^ 1^ %(^)" 



2 .^-^, dxj ^■' dxi 

jj=i * J 

1 ^ dlx{x\xo,A) dlx{x\xo,A) 

The solution method is as follows: as in the reducible case, substituting the 
postulated solution (27) into (28) provides an equation at each order in A 
which is solved for the corresponding coefficient of the expansion. While the 
differential equation for Ix is nonlinear, it can be transformed into a linear 
one by exponentiation and so the expansion r^ constructed in this way 
will approximate Ix- 

Start with the equation of order A^^ which determines the leading order 
coefficient Cj^ . While the leading coefficient Cj^ in the case of a reducible 
diffusion is simply C^ {x\xo) = — ||7(x) — 7(xo)|p/2, the situation is more 
involved when the diffusion X is not reducible. The equation that determines 
the coefficient C^ is obtained by equating the terms of order A~^ in (28), 



LIKELIHOOD FOR DIFFUSIONS 13 

yielding 

(30) 4 '(-M=-H "L ' ) -w( %. )■ 

The solution of this equation is not explicit in general, although it has a 
nice geometric interpretation as minus one half the square of the shortest 
distance from x to xq in the metric induced in M"^ by the matrix v(x)~^ 
(see [29]). 

5.1. Time and state expansion. The analysis of the coefficient C^ sug- 
gests that it will generally be impossible to explicitly characterize the coef- 
ficients of the expansion (27) since (30) will not in general admit an explicit 
solution. This is where the next step in the analysis comes into play. The 
idea now is to derive an explicit approximation in (x — xq) of the coefficients 

(k) 

Cj^ (x\xq), k = —1,0, ... ,K. In other words, I localize the log-likelihood 
function in both A and x — xo- The key difference between what can be 
done in the reducible special case of Theorem 1 and in the general case of 
Theorem 2 is that the coefficients of the expansion in A can be obtained 
directly by (20)-(22) with no need for an expansion in the state variable. 
How this works can be seen by once again considering the coefficient Cj^ 
Consider a quadratic [in {x — xq)] approximation of the solution to the equa- 
tion (30) determining C)^ . The constant and linear terms are necessarily 
zero since the matrix v{x) is nonsingular. Write the second-order expansion 

as Cx {x\xo) = -(l/2)(x - xo)'^V{x - xq) + o{\\x - xo||^). Equation (30) 
implies the equation V = Vv{xo)V, whose solution is ^ = v~^{xo). 

As a consequence, the leading term coming from the expansion of C^ {x\xq) 
in X — Xo is — (l/2A)(x — xq) v{xo)~^{x — xq) so that the leading term in 
the expansion for the log-density will correspond to that of a Normal with 
mean xq and covariance matrix Av{xq). 

(k) 

More generally, I will derive a series in (x — xq) for each coefficient Cj^ , 

at some order jfc in (x — xq). That expansion is to be denoted by C^' ■ One 
remaining question is the choice of the order j^ [in (x — xq)] corresponding 
to a given order k (in A). For that purpose, note that X/\ — Xq = Op{A^''^), 
so 

\cP{XA\Xo)A''-di'''''\xA\Xo)A''\ = Op{\\XA-Xor'A'') 

(31) 

= 0j,(A-''*/2+fc) 

and therefore setting jkf^ + k = K + 1, that is, 
(32) jk = 2iK + l-k), 
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for k = —1,0, . . . ,K, will provide an approximation error due to the expan- 
sion in {x — xq) of the same order A "'"^ for each of the terms in the series 
(27). 

The resulting expansion will then be of the form 



(33) 



5f^(x|xo,A) = -^ln(27rA)-D,(x) 



fe=0 



This double expansion [in A and in (x — xq)] can be viewed, in probability, 
as an expansion in A only once the process is inserted in the likelihood, in 
light of (31). In general, the function need not be analytic at A = 0, hence 
this should be interpreted strictly as a series expansion. 

Finally, note that the term D^{x) which arises in the reducible case from 
the Jacobian transformation is independent of A and so could be built into 

the Cx coefficient. Doing so, however, would subject it to being expanded 
in X — Xq, which is unnecessary since Dy{x) is known. If Dy(x) were being 

expanded along with C^' , we would lose the property that l^ also solves 
the backward equation (29) to the corresponding order in A. 



5.2. Determination of the coefficients in the irreducible case. What re- 
mains to be done is to explicitly compute the expansion Cj^' in x — xq 

of each coefficient Cj^ ■ Let i = {ii,i2, ■ ■ ■ ,im) denote a vector of integers 
and define Ij. = {i = {ii,i2, ■ ■ ■ , im) £ N"* : < tr[i] < j^} so that the form of 
Cf'''^ is 

C^''='')(x|xo) =Y.-^j Pt\^0){xi - Xoit{x2 - Xo^r ■■■{Xm- XOmT'^ . 

(34) 

The coefficients are determined one by one, starting with the leading term 

Cx^'~ ■ Given C^^'' , the next term C^' is calculated exphcitly, and 
so on. Based on (32), the highest-order term (k = —1) is expanded to a higher 
degree of precision j_i than the successive terms. This is quite natural, given 

that C^^' is an input to the differential equation determining C^' , 
and so on. In order to state the main result pertaining to the closed- form 

solutions C 
derivatives: 



solutions C^ ,1 define the following functions of the coefficients and their 



Gy{x\xo) = — -}_^iii{x) 



=1 



dxi 



d vij{x) dC^x ^\x\xo) 
dxi dx-i 



+ E 
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(35) 






Lj=l ^-^^ ^-^3 



Yl ^ii( 



ij=i dxi dxj 

G^ (x|xo) - - 2^ ^^ + - 2.^^-^ 

^ ^54°)(x|xo) aD,(x) 



(36) 



1=1 ^ ' ' 

2 -T^T ^"^ 1 dxidxj dxidxj 

^ / dcf{x\xQ) dP.jx) 

\ dxi dxi 

dcP{x\xo) dP^jx) 
dxj dxj 



and, for k>2, 



Ux [x\xo) -- 2_^tJ-i[x) — h 2^ 



. , dxi .'^^, dxi dxj 

1=1 * 2J = 1 -^ 



(37) 



1 1 f , , fcr''{x\xo) 



m r /;^r'(o),'^u \ ^n /'^^\ /^/--(^-i) 



+ 9 H v^j{x)\ 



dC'^>{x\xQ) ^dP„{x)\dC'x~''{x\xQ) 



dxi dxi J dxj 



h=0 ^ 



^^ dCP{x\xo)ddt'~''\x\xo) 
dxi dxj 
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(k) 

Note that the computation of each function G-^ requires only the abil- 
ity to differentiate the previously determined coefficients Cj^ , ■ • ■ , C^ 
The same applies to their expansions. The following theorem can now de- 
scribe how the coefficients C^'" , that is, the coefficients I3l , i G Ik, are 
determined. 

Theorem 2. For each k = —1,0,. . . ,K, the coefficient C^ {x\xo) in 
(27) solves the equation 

(38) f^^-'\x\xo) = 0, 
where 

^Qn\ #(-2)/ I N or<(-i)/ I ^ V" ^ ^ dC^x^\x\xo) dc';^^\x\xo) 

(39) /^ \x\xo) = -2C)^ >{x\xo)- }_^Vij{x) , 

(40) /^ '{x\xo) = -}^ Vij{x) Gy{x\xo) 






and, for k>l 
(41) 



ft'\x\xo) = cP{x\xo) 



1 V ,, ^^^ ^gx ^(x|xo)aC),^(x|xo) Mk).,, 



where the functions G\^ , k = 0,l,. . . ,K , are given above. The coefficients 

(k) 

PI solve a system of linear equations, whose solution is explicit. 

Gx involves only the coefficients G^ for h= —l,...,k — l, so this system 
of equations can be utilized to solve recursively for each coefficient, meaning 
that the equation fj^ =0 determines G)^ ; given Cj^ , G)^ becomes 
known and the equation f^ =0 determines Gx , given C^ and Gx , 

Gx becomes known and the equation /x — then determines Gx , and 
so on. 

Each of these equations can be solved explicitly in the form of the expan- 
sion Gx' of the coefficient Gx , at order jk in {x — xq). The coefficients 
/?! (xq), i G Ik, of Gx' are determined by setting the expansion /^ ' " 
of fx to zero. The key feature that makes this problem solvable in closed 
form is that the coefficients solve a succession of systems of linear equations: 
first determine P^ for tr[i] = 0, then P^ for tr[i] = 1 and so on, all the way 
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to tr[i] = jfc. Note, in particular, for k = —1, fil =0 for tr[f] = 0, 1 (i.e., the 
polynomial has no constant or linear terms) and the terms corresponding to 
tr[i] = 2 (with, of course, j_i > 2) are 

iG/_i:tr[i]=2 

= -\{x - Xof V"'^ {xq){x - Xq), 

which are the anticipated terms, given the Gaussian limiting behavior of 
the transition density when A is small. Thus, with j_i > 3, we only need to 

determine the terms j3l corresponding to tr[i] = 3, . . . , j_i. Note that the 

solution I3\ depends only on the specification of the v matrix (the drift 

functions are irrelevant). For k = Q, [5\ ={) for tr[i] = 0, so the polynomial 
has no constant term. For A; > 1, the polynomials have a constant term (for 

fc > 1, /?} / for tr[i] = in general). 

To obtain an expansion for the density px instead of for the log-density 
Ix, one can either take the exponential of l\^ or, alternatively, expand 
the exponential in A to obtain the coefficients cx for the expansion of px 
from the coefficients Cx for the expansion of Ix- In general, like a Hermite 
expansion, neither will integrate to one without division by the integral over 
Sx of the density expansion. Positivity is guaranteed, however, if one simply 
exponentiates the log-transition function. 

5.3. Applying the irreducible method to a reducible diffusion. Theorem 
2 is more general than Theorem 1, in that it does not require that the 
diffusion be reducible. As discussed above, in exchange for that generality, 
the coefficients are available in closed form only in the form of a series 
expansion in x about xq. The following proposition describes the relationship 
between these two methods when Theorem 2 is applied to a diffusion that 
is, in fact, reducible. 

Proposition 2. Suppose that the diffusion X is reducible and let l^ 
denote its log-likelihood expansion calculated by applying Theorem 1. Sup- 

~(K) 

pose, now, that we also calculate its log-likelihood expansion, l^ , without 
first transforming X into the unit diffusion Y , that is, by applying Theorem 
2 to X directly. Then, each coefficient C^'" (x|xo) from l\^ is an expan- 
sion in {x—xq) at order j^ of the coefficient C\^ {x\xq) = Cy(7(x) 17(2:0)) 
from Ij^ ' . 

In other words, applying the irreducible method to a diffusion that is, in 
fact, reducible involves replacing the exact expression for C\\x\xq) by its 
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series in {x — xq). Of course, there is no reason to do so when the diffusion 
is reducible and the transformation 7 from X to Y, defined in Definition 1, 
is exphcit. 

However, Proposition 2 is relevant in the case where the diffusion is re- 
ducible, but the transformation 7 is not available in closed form. This can 
occur even in dimension m = 1, where every diffusion is theoretically re- 
ducible. For instance, consider the specification of the diffusion function in 
the general interest rate model proposed in [1], namely cr'^{x) = 9-ix~^ + 
^0 + ^ix + 62X ■^ , where the 0's denote parameters. In that case, 7(x), given 
in (8), involves integrating l/cr and the result is not explicit. Fortunately, 
one can use the irreducible method in that case and the result of applying 
that method is given by Proposition 2. An alternative is to use the method 
that has been proposed by [5] . 

Finally, the double series in A and (x — xq) produced by the irreducible 
method matches, when applied to a reducible diffusion, the expansion pro- 
duced by the Hermite series since the coefficients of the latter [a polynomial 
in (x — Xq), by construction] are obtained as a series in A by computing their 
conditional expectations, as described in (13). But the infinitesimal genera- 
tor of the process in (13) is by definition, such that the resulting coefficients 
solve, at each order in A, the Kolmogorov equations. Hence, the two series 
match. 

6. Convergence to the true log-likelihood function and the resulting ap- 
proximate MLE. Theorems 1 and 2 give the expressions of the coefficients 
of the expansion in the reducible and irreducible cases, respectively. I now 
turn to the convergence of the resulting expansion to the object of inter- 
est, showing that the series constructed above is a Taylor expansion of the 
true, but unknown, log-likelihood function, and considering its application 
to likelihood inference. 

Suppose that (/x, a) are parametrized using a parameter vector 6 and that 
(/u, a) and their derivatives at all orders are three times continuously differ- 
entiable in 0. The differentiability of the coefficients extends to Ix^ in light of 
the previously cited results on the solutions of second-order parabolic partial 
differential equations (Section 9.6 of Friedman [14]), and to the expansion 
by construction, given that it consists of sums and products of (//, a) and 
their derivatives. Let the parameter space G be a compact subset of W . 
Let ^0 denote the true value of the parameter. Assume, for simplicity, that 
for fixed n and A, 1 — >£„(^,A) defined in (2) has a unique maximizer 
^n,A £ ©• Qn,A is the exact (but incomputable) MLE for 9. Consider, now, 
the approximate MLE ^^ ^ obtained by maximizing Vk '{9,/S.) (resp. In ), 

itself defined analogously to (2), but with the expansion Vx (resp. l^ ) 
in the reducible (resp. irreducible) case instead of the true log-transition 
function Ix- We have the following result. 
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Theorem 3. For any n, 

(42) sup|£W(e,A)-4(^,A)H0 

6*60 

{ fC^ 

in Pg^^ -probability as A ^ 0. In the reducible case, the same holds for £n . 
The approximate MLE sequence 9^ ^ exists almost surely and satisfies ^„ ^ — 
dn,A — > in Pg^ -probability as A — > 0. 

Furthermore, suppose that as — > cxo, we have On. a -^ ^o ^^ Pg^ -probability 
and that there exists a sequence of nonsingular r xr matrices Sn,A such that 

(43) 5-k(^n,A-eo)=Op(l). 

There then exists a sequence A„ -^ such that 

(44) 5-iJ0"fi-4,Aj=Op(l). 

Intuitively, the reason that the log-approximation error (42) is small in 
probability is as follows. For small A, in a small neighborhood about xq, the 
approximation error is small by construction because l\^ (resp. ^3^ ) is a 
Taylor expansion of Ix about A = (and about x = xq, resp.). Away from 
Xq, the approximation error may not be small, unless Ix is analytic, but this 
does not matter much because such an error is at most polynomial, while 
the probability of reaching this region in time A is exponentially small. 

Note, also, that it follows from (43)-(44) that ^^ ^ and 9n,A share the same 
asymptotic distribution as -^ 00. For instance, (43) is verified, in particular, 
if the process X is stationary with positive definite Fisher information matrix 
F/^ for a pair of successive observations, in which case Sn,A can be taken to 

be n~^''^F^ (see Billingsley [6] for the required regularity conditions). 

7. Examples. In this section, I apply the above results to a leading mul- 
tivariate diffusion example. The last example shows that the method of this 
paper also applies to time-inhomogeneous models. 

7.1. The Bivariate Ornstein-Uhlenbeck model. Consider the model 

(45) dXt = /3{a-Xt)dt + adWt, 

where a = [aj]j=i,2) /? = [A]i=i,2 and a = [o'ij]ij=i^2 and assume that /3 and 
a have full rank. This is the most basic model capturing mean reversion in 
the state variables. Consider, now, the matrix equation /9A -|- A/3 = aa , 
whose solution in the bivariate case is the 2x2 symmetric matrix A given 
by 

(46) A = _-^_^(Det[/3]aa^ + (/3 - tr[/3])aa^(/3 - tr[/3])^). 
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When the process is stationary, that is, when the eigenvalues of the matrix 
(3 have positive real parts, A is the stationary variance-covariance matrix 
of the process. That is, the stationary density of X is the bivariate Normal 
density with mean a and variance-covariance A. 

The transition density of X is the bivariate Normal density 

Px{x\xo, A) = (27r)~i Det[rj(A)]"i/2 

(47) ^ 

X exp(— (x — m(A,a;o)) ^ (A)(x - m(A,xo))), 

where r7i(A,xo) = a + exp(— /?A)(xo — a) and r2(A) = A — exp(— /?A)Aexp(— /3-^ 
A), with exp denoting the matrix exponential. 

Identification of the continuous-time parameters from the discrete data 
for this particular model is discussed in Philips [24], Hansen and Sargent [15] 
and Kessler and Rahbek [22] . If we wish to identify the parameters in 9 from 
discrete data sampled at the given time interval A, then we must restrict 
the set of admissible parameter values 0. For instance, we may restrict 
in such a way that the mapping (3 i— > exp(— /3A) is invertible, for instance, 
by restricting the admissible parameter matrices (5 to have real eigenvalues. 
This will be the case, for example, if we restrict attention to matrices (3 
which are triangular (and, of course, have real elements). For the rest of this 
discussion, I will assume that Q has been restricted in such a way. 

By applying Proposition 1, we see that the process X is reducible and 
that 7(x) =cr~^x, so 

(48) dYt = {a'^pa - a'^ l3aYt) dt + dWt = K{r] - Yt) dt + dWt, 

where rj = a~^a = [r]i\i=i^2 and k, = a~^Pa = [Kij]ij=i,2- One can therefore 
apply Theorem 1 to obtain the coefficients of the expansion: 

Cy iy\yo) = -^(yi - yoif - Uy^ ~ 2/o2)^ 
Cy iy\yo) = -^(yi - yoi){iyi + yoi -271)^11 + (y2 + yo2 -272)^12) 

- 5(^2 - ym){{yi + yoi - 271)^21 + (y2 + 2/02 - 272)^22), 

Cy iy\yo) = 1(^11 - ((yoi - ??i)kii + (yo2 - V2)l^l2f) 

+ ^(«;22 - ((yoi - m)f^2i + (yo2 - 112)1^22) ) 

- \{yi - yoi)((yoi - f?i)(Kn + kI^) 

+ (y02 - %)(kiiKi2 + K21K22)) 

+ ii(yi - yoi)^(-4Kii^ + Ki2^ - 2K12K21 - 3^21) 

- \{y2 - yo2)((yoi - ??i)(/^ii/^i2 + ^21^22) 

+ (y02-r/2)(K?2+«22)) 
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+ ^(2/2 - y02)^(-4K22 + l4l - 2k12«^21 " SK^a) 

- |(yi - yoi){y2 - yo2)(KiiKi2 + K21K22), 

+ |(yi ~ yoi)(Ki2 - K2i)((yoi - ?/i)(kiiki2 + K21K22) 

+ (y02-??2)(K?2 + K22)) 
+ ^(yi - ?/0l)^(«;i2 - K2i)(kiiK12 + K21K22) 
+ ^(2/2 - ?/02)^(«^21 - Ki2)(kiiKi2 + K21K22) 
+ g(y2 - 2/02)(k21 - Ki2)((yoi - ??i)(k?i + K21) 

+ (2/02 - ??2)('«11'^12 + «^2lK22)) 
+ n(2/l - y0l)(y2 - y02)(Kl2 - K2i)(k22 + K?2 - «^11 + «^2l)- 

Because this is essentially the only multivariate model with a known 
closed-form density (other than multivariate models which reduce to the 
superposition of univariate processes), the Ornstein-Uhlenbeck process can 
serve as a useful benchmark for examining the accuracy of the expansions 
and the resulting MLE. Table 1 reports the results of 1,000 Monte Carlo 
simulations comparing the distribution of the maximum-likelihood estima- 
tor 0(^^^) based on the exact transition density for this model, around 
the true value of the parameters 0(^"-^^> ^ to the distribution of the differ- 
ence between the MLE 0(^^^) and the approximate MLE 0'^' based on the 
expansion with K = 2 terms shown above. To ensure full identification, the 
off-diagonal term K21 is constrained to be zero. As discussed above, this guar- 
antees that the eigenvalues of the mean reversion matrix are both real and 
avoids the aliasing problem altogether. The constraints kh > and K22 > 
make the process stationary, so standard asymptotics give the asymptotic 



Table 1 
Monte Carlo simulations for the bivariate Ornstein-Uhlenbeck model 





g(TRUE) 


Asymptotic 

^(MLE) _ q(TRUE) 


Small 

^(MLE) _ 


sample 

_ g(TRUE) 


Small sample 

^(MLE) _ g(2) 


Parsimeter 


Mean 


Stdev 


Mean 


Stdev 


Mean 


Stdev 


Vi 








0.065 


-0.0013 


0.066 


-0.0000005 


0.000014 


m 








0.032 


-0.001 


0.033 


-0.0000003 


0.000011 


Kll 


5 





1.03 


0.49 


1.11 


0.012 


0.008 


1^12 


1 





1.51 


0.12 


1.64 


0.010 


0.016 


1^22 


10 





1.44 


0.33 


1.46 


0.068 


0.029 
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distribution of ^C^^E) ^^j^^ inverse of Fisher's information is computed as 

E[-dHx/dede'^]-^). 

Each of the 1,000 samples is a series of n = 500 weekly observations (A = 
1/52), generated using the exact discretization of the process. The results 
in the table show that the difference 9^^^^> — 0^"^) is an order of magnitude 
smaller than the (inescapable) sampling error 0("^^^) — ^(Thue)_ Hence, for 
the purpose of estimating 6^^^^^\ ^'2) can be taken as a useful substitute 
for the (generally incomputable) Qi'^^^i , in other words, at least for this 
model, K = 2 provides sufficient accuracy for the types of situations and 
values of the sampling interval A one typically encounters in finance. 

7.2. Comparing the accuracy of the reducible and irreducible methods. 
Using nonlinear transformations of the Ornstein-Uhlenbeck process, we can 
assess the empirical performance of the general method for irreducible dif- 
fusions. Let Y denote the process given in (48) and define Xt = exp(l^) = 
7™^(1^). From Ito's lemma, the dynamics of Xt are given by 

^ Pit(i + ^ii(r?i-ln(Xit)) + '^i2(r/2-ln(X2t)))\ 

" * \X2t{\ + ^^2i{m-HXit)) + ^22{m-HX2t)))) 

(49) 

By construction, this process has a known log-transition function given by 
Ix{x\xq,/S) = — ln(xa:;o) + /y(A,ln(x)| ln(2;o)) and it is reducible by trans- 
forming Xt back to Yt = \u{Xt) = jiXt), with Dy{x) = ln{XitX2t) for that 
transformation. 

But, in order to assess the accuracy of the irreducible method, we can 
directly calculate the irreducible expansion (based on Theorem 2) for the 
model (49). We can then compare it to the closed-form solution, but also 
to the reducible expansion obtained using the order 2 expansion given in 
the previous section for ly and then the Jacobian formula, lxix\xo,A) = 
— ln{XitX2t) + /y(A,ln(x)| In(xo)). Based on Proposition 2, we know that 
in this situation, the irreducible expansion involves Taylor expanding the 
coefficients of the reducible expansion in x about xq. Monte Carlo sim- 
ulations with the same design as in the previous section help document 
the effect of that further Taylor expansion on the accuracy of the resulting 
MLE. The results are given in Table 2 and they show that the difference 
^(MLE) _ ^{2,irrcducibic)^ although larger than Qi^^^) - ^{2,rcducibie) ^ remains 
smaller than the difference 0^^^^> — ^(^^-UE) ^^^ ^^ ^^iq sampling noise. In 
other words, replacing 0^^^^' by ^(2, irreducible) j^g^g ^^^ effect which is not sta- 
tistically discernible from the sampling variation of 9^ ' around 6^^^^^) . 
And, of course, 0(^^^) is generally incomputable, whereas ^(2. irreducible) jg 
computable. 
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Table 2 

Monte Carlo simulations for the exponential transformation of the Ornstein- Uhlenbeck 

model: comparing the reducible and irreducible methods 





0(TRUE) 


Small 

g(MLE) 


sample 

_ g(TRUE) 


Small sample 

^(MLE) ^(2,reducibl 

Mean Stdev 


Small 

e:g(MLE) _ 


sample 

o(2,irreducib 


Parameter 


Mean 


Stdev 


Mean 




Stdev 


Vi 





-0.0019 


0.065 


-0.000003 


0.00009 


-0.0004 




0.0002 


V2 





-0.003 


0.034 


-0.0000001 


0.00001 


0.0003 




0.0002 


Kll 


5 


0.48 


1.06 


0.012 


0.007 


0.08 




0.04 


1^12 


1 


-0.05 


1.57 


0.0087 


0.016 


0.025 




0.03 


«:22 


10 


0.39 


1.50 


0.069 


0.030 


0.21 




0.08 



7.3. Time-inhomogeneous models. Time-inhomogeneous models are of 
particular interest for the term structure of interest rates. A large swathe 
of the term structure literature has proposed models designed to fit ex- 
actly the current bond prices, as well as other market data, such as bond 
volatilities or the implied volatilities of interest rate caps, for instance. Cali- 
brating such a model to time- varying market data gives rise to time- varying 
drift and diffusion coefficients. Typical examples of this approach include 
the models of Ho and Lee [17], Black, Derman and Toy [7] and Hull and 
White [18], where the short-term interest rate (or its log) follows the dy- 
namics dXit = {c({t) — (3{t)Xit) dt + n{t) dWit- Markovian specializations of 
the Heath, Jarrow and Morton [16] model will also be, in general, time- 
inhomogeneous . 

The univariate results of Ai't-Sahalia [2] have been extended to cover 
such models by Egorov, Li and Xu [13]. With expansions now available 
for time-homogenous diffusions of arbitrary specifications and dimensions, 
a time-inhomogeneous diffusion of dimension m can be simply reduced to a 
time- homogenous diffusion of dimension m + \. Indeed, consider the state 
vector Xt = (^li, ■ • ■ , Xmt)- Now, define time as the additional state variable 
Xm.+i,t = t, whose dynamics are dXm+i,t = dt, and consider the extended 
state vector as Xt = {Xu, . . . , X^t^ Xm+i,t)- This is an (m -|- l)-dimensional, 
time-homogenous, diffusion. 

8. Proofs. 

8.1. Proof of Proposition 1. Suppose that a transformation exists and 
define Yt = ^{Xt), where 7(x) = (7i(x), . . . ,7m(x))-'". By Ito's lemma, the 
diffusion matrix of Y is ayiYt) = Vj{Xt) a{Xt). For ay to be Id, we must 
therefore have that VjiXt) =a~^{x) (recall that a is assumed to be non- 
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singular). Thus, 






(50) 


>-'W^)-W 




and hence 






d[a~%{x) 
dxk 


d /d-fi{x)\_ d rd-fi{x)\_ 

dxk V dxj J dxj V dxk J 


dxj 



for all {i,j,k) = 1, . . . ,7n. Continuity of the second-order partial derivatives 
is required for the order of differentiation to be interchangeable. Here, we 
have infinite differentiability. 

Conversely, suppose that a~^ satisfies (10). Then, for each i = 1, . . . ,m, 
use row i of the matrix a~^, a^, = [[cr~^]ij]j=i,...,m5 to define the differential 
1-form iOi = J2JLiW~^]ij dxj and calculate its differential, the differential 2- 
forni duJi. Condition (10) implies that duJi = 0, that is the differential 1-form 
iOi is closed on Sx- The domain Sx is singly connected (or without holes). 
Therefore, by Poincare's lemma (see, e.g.. Theorem V.8.1 of Edwards [12]), 
the form toi is exact, that is there exists a differential 0-form 7^ such that 
dji = oji. In other words, for each row i of the matrix a~^, there exists a 
function 7^ defined by 7i(x) = j^^ [(7~^]ij(x) dxj (the choice of the index j is 
irrelevant) which satisfies (50), has the required differentiability properties 
and is invertible. The function 7 is then defined by each of its d components 
7i, i = 1, . . . ,m, and because of Assumptions 2 and 3, it is invertible and 
infinitely often differentiable. By construction, Yt = "y{Xt) has unit diffusion 
and therefore X is reducible. To prove the equivalent characterization (9), 
apply Ito's lemma from y to X (instead of from X to Y) and proceed as 
above. 

8.2. Proof of Theorem 1. To show that (15) with the coefficients given 
in the statement of Theorem 1 indeed represent the Taylor expansion in A 
of the log-density function ly, at order K — \, it suffices to verify that the 
difference between the left- and right-hand sides in the Kolmogorov forward 
and backward partial differential equations is of order A^' . 

Define Fy (y|yO)^) [resp. By {y\yQ.,/^)] as the difference between the 
left- and right-hand sides of the forward (resp. backward) equations when 
ly is replaced hy ly ■ The backward equation for ly is 

5ly(y|yo,A) ^ 9/y(y|yo,A) 

OA =l^/^^^(yo) a,o. 

(51) 

- dHy{y\yoA) , l>^/a/y(y|yo,A)^2 



-1 lib 



-2 — + oE( 



2tt dyl 2^V dy 



Oi 
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Substituting in the expansion (15) and equating the coefficients of A ^ on 
both sides of (51) yields 



c^Y'\v\yo) 



l(dC^y^\y\y,)\^(dC'f^\y\y^) 



2\ dy, 



'Oi 



\ dy^ 



OJ 



which is satisfied by the (aheady determined) solution (20), which is there- 
fore the desired solution. 

Starting with the Gaussian leading term (20), we have 



K~l 



F^^\y\y,A)= E AKy\yo)Tv + o{^ 



{k), 



A^ 



,K\ 



k=-l 
K-l 



<^(y|yo,A)= E bP{y\yo)— + 0{/\ 



(k), 



A^' 



.K\ 



k=-l 



k\ 



[with the convention that (— 1)! = 0! = 1]. The first term in Fy is 



/y ^\y\yo) = -^ + ^i^Yi{y) 



.(-1) 



•m Q2r'(~l)^ 



dC)~'>{y\y,) 1 - a^Cf^^(ylyo) 



j=i 



dyi 



-.T.- 



i=l 



dyi 



ly^d&Y^\y\y,)dCP{y\yo) 



i=l 



dyi 



dyi 



Y^{yi - yoi)f^Yi{y) + "^{yi - yoi 



dcP{y\yo) 



i=l 



i=l 



dyi 



.(0), 



Solving the equation /y {y\yo) = for Cy'iylyo) yields the full solution 

m „i 

Cy \y\yo) = y^iy^ ~ yoi) / fj-niyo + uiy - yo)) du 

i=l -^0 



(52) 



+ E 



affl ^--^o- +Mf, 



^J 



*J=1: j¥* 



yj - yoj 



where the aj ■ and My are integration constants in the differential equation 

fy =0, hence arbitrary functions of yo- The boundary condition that Cy 
be finite when passing through the axes yj = yjo for all j = 1, . . . , m imposes 



.(0) 



{K) 



the condition a\- = 0. To determine My , (51) gives 



.(0), 



by {y\yo) = - E(y« ~ yoi)fj'Yiiy) - E(2^« ~ yoO — K^ — 



i=l 



i=l 
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and the candidate solution (52) must satisfy by =0. Thus, we must have 

1^ — H7 — (yi - yoi) = 



r(0) 



5yi 



Oi 



for all y and yo ^'^^ so My (yo) is constant. Since the limiting behavior of 
PY is N{0, 1) as A ^ and 

limJl'^\y\yo,A) + ^H27TA) + ^\\y-yo\\A=C^\y\yo), 



A^O 



r(0) 



we must have My = to ensure that as A ^ 0, the limiting density inte- 
grates to one [otherwise, it integrates to exp(My )]. 
The next term is 



Wi 



j-(o)/ I \ ^(1)/ I ^ , Y^/ ^^Cl.'{y\yo) , ^^ 

fy iy\yo) = C{^ iy\yo) + i^ivi - yoi) — ^ — + 2^ 



j=i 



dyi 



i=l 



dyi 



.(0), 



™ , dC^"{y\yo) 



i=l 



lj^l d'C^\y\yo) ^ 



i=l 



dy 



5Cf(y|yo)^2- 



r<i^)( I ^ , V^^ xSC'y (y|yo) ^{i) , , s 

Cy {ym) + 2^{yi - yoi) — -j^ Gy{y\yo), 



i=l 



.(1) : 



where Gy' is given in (23) and depends on the previously determined Cy 



(-1) 



and G 



(0) 
Y ■ 



Solving the equation fy {y\yo) = 0, which is linear in Cy , similarly yields 
the explicit solution 

C?{y\yo)= f'G?{yo + u{y-yo)\yo)du 



y- (1) yr - yoi 3^(1) 

which includes generic integration constants a\, and My . The solution 



.m 



i{K) 



has a) • = when imposing finiteness of ly when passing through the axes 

yj = yjo for all j = 1, . . . ,m. As for My , invoking the backward equation 
(51) yields 

(1) ^ dM^'\yo) ^ 

My iyo) - 1^ — a (yi - yoi) = o, 
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whose only solution valid for all {y,yo) is My (yo) =0. This yields the 
coefficient Cy ■ 

More generally, the term fy , /c > 1, is given by 

fy iy\yo) = CY (y\yo) + i:2^iyi-yoi) — h] g^ (y|yo), 

K j=i oyi 

where Gy is given in (24) and depends on the previously determined 
C^~^\cP, ..., C^~^\ Solving the equation fP{y\yo) = for C^^ (with 
the same boundary condition as for Cy and Cy ) yields the explicit solution 
(22). In this case, the full solution including generic integration constants 
Uij and My , is 

C^^\y\yo) = k f'GP{yo + u{y-yo)\yo)u''-'du 
Jo 

m 



~<{k) 



■i;j='i-d¥'i 



^'^ iyj-yoj)'^' ^ 



Thus, by construction, the solution Cy , k = —1, 0,. . . ,K given in the state- 
ment of the theorem is such that Fy {y\yQ,A) = 0(A ). Similarly, 

By (y|yO)^) = O(A^). The fact that solving the Kolmogorov equations 
to order A yields a Taylor expansion of order if — 1 of /y is established as 
part of the proof of Theorem 3 below. 

8.3. Proof of Theorem 2. Let F^^ and F^^ (resp. B^^^ and ^$f ^) 
denote the difference between the left- and right-hand sides of (28) [resp. 
(29)] when Ix is replaced by the expansion l)^ (resp. Ix )■ We have 

Ff )(x|xo, A) = Y: /f (x|xo)^ + O(A^) 

k=-2 '^■ 

[with the convention that (—2)! = 2 and (— 1)! = 0! = 1]. The highest-order 
term is fx , given by (39), and the coefficient function Cx is such that 
it sets fx to zero. We have then successively determined C^ by setting 
fx in (40) to zero and, more generally, given C^ , Cj^ , • ■ • , C^ , the 
expression (41) for f^ is defined and can be set to zero to determine the 
next coefficient C^ ■ The form of the log-likelihood adopted in (27) with 
Dy kept separate from Cj^ is essential to obtain B^ (a^l^o. A) = 0(A) in 
addition to ^^^^(xlxo. A) = O(A^). 
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To determine the expansions in x — xq for each coefficient Cx , k> —1, 
replace C^ by C^' in each equation in turn. Starting with (39), calculate 
an expansion of /j^ in (x — xq) to order j_i. This determines a system 
of equations in the unknown coefficients I3\ , i G /-i [which appear when 

Cjf is expanded, as in (34)]. By construction, there are as many equations 
as unknowns (both are given by the number of elements in /_i). This system 
of equations can always be solved explicitly because it has the following form. 
First, I3\ =0 for tr[i] = 0, 1 (i.e., the polynomial has no constant or linear 
terms) and the terms corresponding to tr[i] = 2 (with, of course, j_i > 2) 
are 

^ (5\~^\xo){xi - Xoi)^l ■■■{Xra- ^Om)*™ 

iG/-i:tr[j]=2 

= -{x- XQfv~^{xQ){x - Xo), 

which is the anticipated term, given the Gaussian limiting behavior of the 
transition density when A is small. Thus, with j_i > 3, we only need to 
determine the terms I3\ corresponding to tr[i] =3, ...,j_i. Then, the 
next order coefficients in [x — xq), that is, the coefficients corresponding 
to tr[i] = 3, each appear linearly in a separate equation. That is, we have 

a system $3 (xq) • fi\ {xa) = 03 (2:0) whose explicit solution is given 

by /?3 (xq) = Inv[<l>3 (xq)] • 03 (xq). Given the previously determined 
coefficients corresponding to tr[i] =0, . . . , r, the equations determining the 

coefficients for tr[i] = r + 1 are given by a linear system '^\._^^{xq) ■ I3)._^i {xq) 

= cl\.j^i{xq), where the matrix '^\.j^i and the vector a\.j^i are functions of 

the previously determined coefficients /3J , for tr[i] = 0, . . . , r, and xq. 

The same principle applies to all values of k. For /c = 0, /?j = for 
tr[i] = 0, so the polynomial has no constant term. For /c > 1, the polynomials 

have a constant term (for k>l,l3\ 7^ for tr[i] = 0, in general). The same 

principle applies to each equation in turn: once C^^' is determined, a 

Taylor expansion of (40) determines the coefficients j3\ , i G /q, and so on. 

8. 

Cy (7(x)|7(xo)). By construction (see the proof of Theorem 2), the coeffi- 
cients C^' cire then Taylor expansions of the coefficients C^ (which are 
the solutions of the equations f^ — 0). 

8.5. Proof of Theorem 3. Consider the irreducible case; everything also 
applies to the reducible case, by simply eliminating the arguments relative 



8.4. Proof of Proposition 2. If the diffusion X is reducible, then C^ {x\xo) 
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to the additional expansion in x — xq. The expansion p^ , constructed as 
an expansion in A and (x — xq) of exjp{lj^ ), satisfies the hnear backward 

~ (K) 

equation, but with a remainder term Bj^ : 



2.^^ '^ dxoidxoj 

Indeed, the coefficients C^*" , k = —1, ■ ■ ■ ,K, are constructed in such a 
way that the terms of order A and higher of the right-hand side of (53) 

are zero. The remainder term is of the form Bj^ {x\xo, A) = A p-^ (x|xo. A) 

ijj)^ (x|xo. A) where ip\^ {x\xo, A) is a sum of products of the functions /ij, 

fjj , the coefficients Cx' \ k = —1, . ■ . ,K, and their derivatives. Specificahy, 

ipx (^l^^o,^) = CK fx {x\xo) +o{A), where the functions fx are given 
in Theorem 2 and where ck is a constant. 

The coefficients and their derivatives exhibit at most polynomial growth 
as a result of the explicit expressions given in Sections 4 and 5, combined 
with Assumption 4 on {fj.,v) and their derivatives. Thus, tpx exhibits at 

most polynomial growth and we have if^x {x\xo,A) = 0(1) uniformly for all 
(x, Xq) in a compact subset of the interior of Sx and for all in G, by virtue 
of the continuity of the functions and their derivatives with respect to the 
parameter vector in the compact set Q. 

The solution Px of the approximate PDE with remainder, (53), is an 
approximation of the solution px of the exact PDE without remainder term 

(K) (K) 

due to the following. Writing rx —Px ~Px, it is clear, by linearity of the 

PDE for px, that rx also satisfies equation (53) with the same remainder 

Bx , but now with initial condition fj^ (x|xo. A) — > as A ^ (whereas 

Px and Px both converge to a Dirac mass at xq as A — > 0). The solution 
is given by 

(54) rf\x\xo,A)= [ [ BP{x\z,T)pxiz\xo,A-T)dTdz, 

JSx Jo 

which follows from the facts that (54) produces the correct initial boundary 
behavior and, as can be seen by computing the relevant partial derivatives 
of this expression for fj^ , satisfies (53). The function Bx Px is integrable 
because px has exponentially decaying tails in a neighborhood of A = 
(see below), whereas (px has polynomial growth. It follows that we have 
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f-^ (x|xo, A) = O(A^) uniformly for all {x,xq) in a compact subset of the 
interior of S"^ and for all 6 in O . Let 



Wx '{x\xq,/:^) =SW£>\f\ '{x\xq,/S)\ 
eee 



and consider the expectation 

(55) E0P{Xa\Xo,A)\Xo = xo]= f R^J^\x\xo,A)px{x\xo,A)dx. 

JSx 

In the integral above, divide the region Sx ^ IK™" into two parts — a neigh- 
borhood of xq of the form M = WjXA^Oi — A^''^ci^,XQi + A^/^ca], where ca 
is a sequence of positive numbers such that ca — > oo and A^' ^ca — > 0, and 
the complement Sx\J^ of that neighborhood. 

From above, there exist constants C and M such that \Rx (^^I^^OjA)! < 
MA^ for ah x G AA, so 

(56) / B!^p{x\xo,A)px{x\xo,A)dx = 0{A^^). 
JM 

Outside the neighborhood A/", the approximation error IVyr need not be 
small; however, it is at most polynomial in {x — xq). In a neighborhood of 
A = 0, the tail behavior of the transition function px is driven by the term 
exp(-f ln(A) + C7j'-''"^^(x|xo)A-i), with CJ^'-''"^^(x|xo) = -(l/2)(x - 
x^Yv~^{xq){x - xo) + o(||x - xoP). 

Therefore, to bound /^ \j^ r-^ px dx, one needs to integrate a polynomial 

error term in r)^ , say ||x — xqH'', ^ > 0, against the exponential tails of px- 
This results in integrals of the form (written in the univariate case m = 1, 
for simplicity, and near an infinity boundary) 

A-^/2 / |x-xo|''exp(-(x-xo)V(2Au(xo)))dx 

(57) 

= A'/^ / \z-zo\'exp{-{z-zof/{2v{xo)))dz, 

JcA 

with the change of variable z — zq = {x — xo)/A^'^, and similarly on the 
interval [— oo, — A^'^ca]- Since ca —>■ oo, the above integral converges to zero. 
It follows from (56) and (57) that E[R^P{Xa\Xq,A)\Xo]^0 as A ^ 0. 
Similar calculations show that this is also the case for Yar[R^ {X/\\Xq, A)\Xq\ 
Therefore, R^ (XaI-'^O) A) — > in Pgo'Pi^obability, given Xq, which follows 
by Chebyshev's inequality from the convergence to zero of the conditional 
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~ (K) 

expected value and variance of R)^ . Convergence to zero of the condi- 
tional probability implies convergence to zero of the unconditional probabil- 
ity, since 

Pii0P{Xt+A\Xt,A)\>e) 

= [ FT:0p{Xt+A\Xt,A)\>e\Xt = xo)TTtixo)dxo, 
JSx 

where irtixo) denotes the marginal density of X at time t. Since proba- 
bilities are between and 1 and since nt integrates to 1, the convergence 
of the unconditional probability follows by Lebesgue's dominated conver- 
gence theorem. Next, the convergence in Pe^ -probability for the log-density 

~ (K) 

follows from the continuity of the logarithm and the convergence of R\^ . 
Then, for fixed n, the convergence stated in (42) follows from that of l^ 
to Ix- From the assumed existence of the maximizer 9n,A of £n{0,A) in Q 
and the (just obtained) proximity of the two objective functions, it follows 

from standard arguments that the maximizer of in {9, A), that is, 6*^^, 

exists almost surely, is in and is close to 9n,A as A — > 0, in the sense that 

o^ ^ — 9n,A — > in Pgg -probability. Finally, the speed at which ^^ ^ — 9n,A 
converges to zero can be made arbitrarily small for any n by taking A ^ 
sufficiently fast. In particular, as — > oo, a sequence A„ — > can be taken to 
be such that (44) is satisfied. 

9. Conclusions. This paper provides a method to derive closed-form Tay- 
lor expansions to the likelihood function of arbitrary multivariate diffusions. 
While these expansions are local in nature, at least in the irreducible case, 
they have been shown to produce useful approximations in the context 
of maximum likelihood estimation. Ai't-Sahalia and Kimmel [3] apply this 
method to popular stochastic volatility models. Likelihood expansions for 
these models are derived, as well as a Monte Carlo investigation of the prop- 
erties of maximum likelihood estimators of the parameters computed from 
these expansions. 

Once the expansion is computed for the diffusion model at hand, it can 
be applied to the estimation of parameters by a variety of other estimation 
methods which require an expression for the transition density of the state 
variables, such as Bayesian methods where one wishes to obtain a posterior 
distribution for the parameters of a stochastic differential equation, or to 
generate simulated data at the desired frequency from the continuous-time 
model or to serve as the instrumental or auxiliary model in indirect infer- 
ence and simulated or efficient moments methods. The explicit nature of the 
expansion as a function of all of the relevant variables makes these compu- 
tations, whether maximization of the classical likelihood or computation of 
posterior distributions, straightforward and computationally very efficient. 
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Other methods can be used to approximate the transition function: they 
involve numerically solving a partial differential equation, simulating the 
process to Monte Carlo integrate the transition density or approximating 
the process with binomial trees. However, none of these alternative methods 
provides closed-form formulae. Jensen and Poulsen [20], Stramer and Yan 
[26] and Hurn, Jeisman and Lindsay [19] compare the accuracy and speed of 
the different methods, showing the advantages of this closed-form approach. 
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